Travel Package Purchase Prediction

Assignment details

https://olympus.greatlearning.in/courses/40608/assignments/123646

Objective

To predict which customers are more likely to purchase the newly introduced travel package.

Customer details:

Customer interaction data:

Load Dataset

View the head and tail of the dataset to get a first look at the data

Exploring the dataset

Check the data types of the columns for the dataset.
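A minimal sketch of this check with pandas, using a few hypothetical rows whose column names follow the ones named in this notebook (the values are made up). Converting the text columns to the `category` dtype is an assumed convenience step, not necessarily what the original notebook does.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from this notebook
df = pd.DataFrame({
    "Age": [41.0, 49.0, None],
    "TypeOfContact": ["Self Enquiry", "Company Invited", None],
    "Gender": ["Female", "Male", "Fe Male"],
    "CityTier": [3, 1, 1],
})

# Inspect the dtype pandas inferred for each column
print(df.dtypes)

# Assumed convenience step: store the text columns as 'category'
cat_cols = ["TypeOfContact", "Gender"]
df[cat_cols] = df[cat_cols].astype("category")
print(df.dtypes)
```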

Observations

Sanitizing the data: find null values, duplicates, and irrelevant columns
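The three checks can be sketched in pandas as follows. The sample rows are hypothetical, and treating `CustomerID` as the irrelevant column is an assumption (an identifier carries no signal about purchasing behaviour).

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last row is an exact duplicate of the first
df = pd.DataFrame({
    "CustomerID": [200000, 200001, 200002, 200000],
    "Age": [41.0, np.nan, 37.0, 41.0],
    "DurationOfPitch": [6.0, 14.0, np.nan, 6.0],
    "Passport": [1, 0, 1, 1],
})

# 1. Missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# 2. Fully duplicated rows: count them, then drop them
n_dupes = int(df.duplicated().sum())
df = df.drop_duplicates()

# 3. Assumed irrelevant column: an ID has no predictive meaning
df = df.drop(columns=["CustomerID"])
print(df.shape)
```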

Observations

Further exploration to see whether the dataset can be sanitized further

Null values in TypeOfContact

There are null values in the DurationOfPitch variable

The Gender column contains the typo "Fe Male", which needs to be corrected to "Female"
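One way to fix the typo is to map "Fe Male" onto "Female" with `Series.replace`; the sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical Gender values containing the "Fe Male" typo
df = pd.DataFrame({"Gender": ["Male", "Female", "Fe Male", "Male", "Fe Male"]})
print(df["Gender"].value_counts())

# Merge the misspelled level into the correct one
df["Gender"] = df["Gender"].replace("Fe Male", "Female")
print(df["Gender"].value_counts())
```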

Check for missing values

EDA

Univariate analysis

Observations:

Observations:

Observations

Analyze categorical variables

Observations:

Observations:

Observations:

Observations:

Observations:

Observations:

Bivariate analysis

Observations:

Key Observations

Relation between ProductPitched (target variable) and the other independent variables

Observations

Observations from EDA

Fix missing values
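A sketch of a common imputation strategy, assuming the median for the numeric DurationOfPitch column and the mode for the categorical TypeOfContact column; the notebook's actual choice may differ, and the rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with one gap in each column
df = pd.DataFrame({
    "TypeOfContact": ["Self Enquiry", None, "Company Invited", "Self Enquiry"],
    "DurationOfPitch": [6.0, np.nan, 14.0, 8.0],
})

# Numeric column: the median is less sensitive to outliers than the mean
df["DurationOfPitch"] = df["DurationOfPitch"].fillna(df["DurationOfPitch"].median())

# Categorical column: fill with the most frequent value (the mode)
df["TypeOfContact"] = df["TypeOfContact"].fillna(df["TypeOfContact"].mode()[0])
print(df)
```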

Create Dummy variables
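A minimal one-hot encoding sketch with `pd.get_dummies` on hypothetical rows. `drop_first` is left at its default (`False`), which keeps a column such as `Designation_Executive`; tree-based models do not require dropping one level per variable.

```python
import pandas as pd

# Hypothetical rows with two categorical columns and one numeric column
df = pd.DataFrame({
    "Designation": ["Executive", "Manager", "Senior Manager", "Executive"],
    "TypeOfContact": ["Self Enquiry", "Company Invited", "Self Enquiry", "Self Enquiry"],
    "Age": [30, 45, 50, 28],
})

# One 0/1 column per category level; the original text columns are removed
X = pd.get_dummies(df, columns=["Designation", "TypeOfContact"], dtype=int)
print(X.columns.tolist())
```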

Observations

Zero missing values remain.

Detect outliers and decide whether to treat them

Treating Outliers
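A sketch of one common treatment: capping values at the 1.5 * IQR whiskers instead of deleting rows. The IQR capping rule, the hypothetical helper `cap_outliers_iqr`, and the MonthlyIncome values are assumptions for illustration.

```python
import pandas as pd

def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Hypothetical MonthlyIncome values with one extreme high earner
income = pd.Series([17000, 20000, 21000, 22000, 23000, 24000, 98000], name="MonthlyIncome")
capped = cap_outliers_iqr(income)
print(capped.tolist())
```

Capping keeps every customer in the dataset, which matters when the positive class is already small.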

Observations

The outliers have been treated.

Build the Model

Split the data into train and test sets
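A sketch of a stratified 70/30 split with scikit-learn. Stratifying on the target keeps the share of buyers (approximately) equal in both sets, which matters for an imbalanced target; the 30% test size, the `ProdTaken` label name, and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical features and an imbalanced 0/1 target (20% buyers)
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "Age": rng.integers(18, 61, size=100),
    "Passport": rng.integers(0, 2, size=100),
})
y = pd.Series([1] * 20 + [0] * 80, name="ProdTaken")

# stratify=y preserves the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)
print(y_train.mean(), y_test.mean())
```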

Build Decision Tree Model
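A minimal decision-tree sketch on synthetic, imbalanced data from `make_classification` (roughly 20% positives, an assumed ratio). `class_weight="balanced"` is an assumed setting to counter the imbalance. An unconstrained tree typically fits the training set perfectly, so comparing train and test scores is the first overfitting check.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the travel data: about 20% positive (buyer) rows
X, y = make_classification(n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Assumed setting: weight classes inversely to their frequency
dtree = DecisionTreeClassifier(class_weight="balanced", random_state=1)
dtree.fit(X_train, y_train)

train_acc = accuracy_score(y_train, dtree.predict(X_train))
test_pred = dtree.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)
test_recall = recall_score(y_test, test_pred)
print(f"train acc={train_acc:.3f}  test acc={test_acc:.3f}  test recall={test_recall:.3f}")
```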

Observations

Observation

Top features are: Passport, Designation_Executive, Age, CityTier, PreferredPropertyStar

Building various ensemble models

Bagging Classifier

The BaggingClassifier is overfitting the training data.
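The overfitting can be made visible by comparing training and test accuracy. A sketch on synthetic data (the dataset and `random_state` values are assumptions); a large train/test gap is the symptom described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# The default base estimator is a fully grown decision tree, which memorizes its bootstrap sample
bagging = BaggingClassifier(random_state=1).fit(X_train, y_train)

train_acc = accuracy_score(y_train, bagging.predict(X_train))
test_acc = accuracy_score(y_test, bagging.predict(X_test))
print(f"train={train_acc:.3f}  test={test_acc:.3f}  gap={train_acc - test_acc:.3f}")
```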

Random Forest Classifier

Observations:

Tune hyperparameters to get better results

Observations

Insights

Random Forest Classifier - Hyperparameter Tuning
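A hedged sketch of grid search for the random forest. The grid values and synthetic data are illustrative, and scoring on recall assumes that missing a likely buyer is costlier than a false alarm; swap in the notebook's own metric if it differs.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Illustrative grid: 2 x 2 x 2 = 8 candidates, each evaluated with 3-fold CV
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [4, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=1),
    param_grid,
    scoring="recall",  # assumed metric of interest
    cv=3,
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the whole training set with the best parameters
best_rf = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```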

Observations

Observations

Checking the feature importance
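Impurity-based importances can be read from a fitted tree ensemble's `feature_importances_` attribute and sorted. The sketch below uses synthetic data with column names borrowed from this notebook purely as labels, so its ranking says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, n_informative=2, n_redundant=0, random_state=1)
# Borrowed names used as labels only; the values are synthetic
X = pd.DataFrame(X, columns=["Passport", "Age", "DurationOfPitch", "CityTier", "MonthlyIncome"])

model = RandomForestClassifier(random_state=1).fit(X, y)

# Importances are normalized to sum to 1; sort from most to least important
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```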

Observations

Top 3 features: Passport, Age, and DurationOfPitch

Compare all 34 models

Build boost models

AdaBoost Classifier

Gradient Boosting Classifier

XGBoost Classifier

Observations

Hyperparameter Tuning - AdaBoost Classifier

Observations

MonthlyIncome is the most important feature as per the tuned AdaBoost model.

Gradient Boosting Classifier

Let's try using an AdaBoost classifier as the init estimator that produces the initial predictions
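In scikit-learn this corresponds to the `init` parameter of `GradientBoostingClassifier`, which accepts an estimator whose predictions seed the boosting stages. A minimal sketch on synthetic data (the dataset and settings are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# init: the AdaBoost model supplies the starting predictions that the
# gradient-boosting stages then correct
gbc_init = GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1), random_state=1)
gbc_init.fit(X_train, y_train)

test_acc = gbc_init.score(X_test, y_test)
print(f"test accuracy={test_acc:.3f}")
```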

Observations

As per the tuned Gradient Boosting model, MonthlyIncome is the most important feature, followed by Age and DurationOfPitch.

XGBoost Classifier

Observations

As per the XGBoost model, Passport is the most important feature, unlike the AdaBoost and Gradient Boosting models, where MonthlyIncome is the most important feature.

Comparing all models
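One way to compare the models side by side is to collect their test metrics into a single DataFrame. The sketch below uses synthetic data and untuned models, so it shows the mechanics only, not the notebook's results; XGBoost is omitted to keep the example to scikit-learn, and ranking by recall is an assumption.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

rows = []
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    rows.append({
        "Model": name,
        "Test Recall": recall_score(y_test, pred),
        "Test Precision": precision_score(y_test, pred, zero_division=0),
        "Test F1": f1_score(y_test, pred, zero_division=0),
    })

# One row per model, best test recall first
comparison = pd.DataFrame(rows).set_index("Model").sort_values("Test Recall", ascending=False)
print(comparison.round(3))
```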

Recommendation
